Overview

Dataset statistics

Number of variables: 12
Number of observations: 19681
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): < 0.1%
Total size in memory: 1.8 MiB
Average record size in memory: 96.0 B

Variable types

NUM: 8
CAT: 4

Warnings

Dataset has 6 (< 0.1%) duplicate rows [Duplicates]
user_id has a high cardinality: 19675 distinct values [High cardinality]
devicebrand has a high cardinality: 100 distinct values [High cardinality]
devicebrand is highly correlated with attr_os_str [High correlation]
attr_os_str is highly correlated with devicebrand [High correlation]
avg_revenue is highly skewed (γ1 = 26.47550501) [Skewed]
cnt_call is highly skewed (γ1 = 30.02842689) [Skewed]
cnt_dis is highly skewed (γ1 = 47.58521106) [Skewed]
cnt_add_ons is highly skewed (γ1 = 47.99302987) [Skewed]
user_id is uniformly distributed [Uniform]
cnt_call has 18800 (95.5%) zeros [Zeros]
cnt_dis has 18681 (94.9%) zeros [Zeros]
cnt_mobile has 1472 (7.5%) zeros [Zeros]
cnt_internet has 18713 (95.1%) zeros [Zeros]
cnt_tv has 16062 (81.6%) zeros [Zeros]
cnt_voice has 19260 (97.9%) zeros [Zeros]
cnt_add_ons has 10600 (53.9%) zeros [Zeros]
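The warnings above follow simple threshold rules. The stdlib Python sketch below shows rules of that kind. The threshold values are hypothetical: they were picked only so that they agree with which columns are flagged in this report (a skewness of 26.5 is flagged but 13.3 is not, 7.5% zeros is flagged but 0.1% is not, 100 distinct brands is flagged but 10 country codes is not). They are not read from the configuration that produced the report, and the function names are illustrative, not pandas-profiling's API. Skewness uses the adjusted Fisher-Pearson estimator, which is what pandas' Series.skew computes.

```python
import math
from collections import Counter

# Hypothetical thresholds: chosen only to agree with the flags in this report,
# not read from the pandas-profiling configuration that produced it.
SKEW_THRESHOLD = 20.0        # |skewness| above this -> "Skewed"
ZEROS_THRESHOLD = 0.05       # share of zeros above this -> "Zeros"
CARDINALITY_THRESHOLD = 50   # distinct values above this -> "High cardinality"


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (the estimator behind pandas' Series.skew)."""
    n = len(xs)
    if n < 3:
        return 0.0  # too few points to estimate skewness; treat as unskewed
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant column
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def numeric_warnings(name, xs):
    """Return 'Skewed' and 'Zeros' style messages for one numeric column."""
    warnings = []
    skew = sample_skewness(xs)
    if abs(skew) > SKEW_THRESHOLD:
        warnings.append(f"{name} is highly skewed (γ1 = {skew:.8f})")
    zeros = sum(1 for x in xs if x == 0)
    if zeros / len(xs) > ZEROS_THRESHOLD:
        warnings.append(f"{name} has {zeros} ({zeros / len(xs):.1%}) zeros")
    return warnings


def categorical_warnings(name, values):
    """Return a 'High cardinality' style message for one categorical column."""
    distinct = len(Counter(values))
    if distinct > CARDINALITY_THRESHOLD:
        return [f"{name} has a high cardinality: {distinct} distinct values"]
    return []


# Toy column: 999 zeros and one large value, which is both skewed and zero-heavy.
print(numeric_warnings("cnt_demo", [0] * 999 + [1000]))
```

A single outlier among n values gives a skewness of roughly the square root of n. That is why zero-inflated count columns like cnt_dis reach values far above any reasonable threshold.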

Reproduction

Analysis started: 2020-12-13 17:09:14.230245
Analysis finished: 2020-12-13 17:09:39.845533
Duration: 25.62 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

user_id
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 19675
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 153.8 KiB
Value                                 Count  Frequency (%)
2917446999                            2      < 0.1%
2514811034                            2      < 0.1%
3103546688                            2      < 0.1%
TMCZ_9613733                          2      < 0.1%
100487292                             2      < 0.1%
TMCZ_6009584899                       2      < 0.1%
955718ff-519a-48ed-8f49-0fd7a76c741b  1      < 0.1%
TMCZ_6007561144                       1      < 0.1%
2343743                               1      < 0.1%
3286175011                            1      < 0.1%
Other values (19665)                  19665  99.9%
[Figure: Frequencies of value counts]

Unique

Unique: 19669
Unique (%): 99.9%
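"Distinct" and "Unique" count different things. Distinct is the number of different values. Unique is the number of values that occur exactly once. For user_id, the six IDs that occur twice (see the duplicate rows below) are distinct but not unique, so 19675 - 6 = 19669. The short stdlib sketch below makes the distinction concrete. The function name is illustrative, not part of pandas-profiling.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique).

    distinct: how many different values occur.
    unique:   how many values occur exactly once.
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# "a" occurs twice: it counts toward distinct but not toward unique.
print(distinct_and_unique(["a", "a", "b", "c"]))  # (3, 2)
```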
[Figure: Histogram of lengths of the category]

Length

Max length: 75
Median length: 10
Mean length: 15.2909405
Min length: 4

avg_revenue
Real number (ℝ≥0)

SKEWED

Distinct: 17615
Distinct (%): 89.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.33600217
Minimum: 0
Maximum: 5885.225
Zeros: 10
Zeros (%): 0.1%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 2.4689
Q1: 10.0833
Median: 22.53333333
Q3: 46.6347
95th percentile: 120.8426667
Maximum: 5885.225
Range: 5885.225
Interquartile range (IQR): 36.5514

Descriptive statistics

Standard deviation: 99.50439393
Coefficient of variation (CV): 2.466887856
Kurtosis: 1072.73635
Mean: 40.33600217
Median Absolute Deviation (MAD): 15.00273333
Skewness: 26.47550501
Sum: 793852.8588
Variance: 9901.124411
Monotonicity: Not monotonic
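Several of these figures are simple functions of one another. CV is the standard deviation divided by the mean (99.50439393 / 40.33600217 ≈ 2.4669). The IQR is Q3 - Q1 (46.6347 - 10.0833 = 36.5514). The variance is the squared standard deviation. The sketch below recomputes these quantities with Python's statistics module. It assumes sample (n - 1) variance and linearly interpolated quartiles. That is an assumption about how the report was produced, not something the report states. The function name describe is illustrative.

```python
import statistics


def describe(xs):
    """Recompute some of the report's summary statistics for a list of numbers.

    Assumes sample (n - 1) variance and linearly interpolated quartiles.
    """
    mean = statistics.mean(xs)
    variance = statistics.variance(xs)   # sample variance, n - 1 denominator
    std = statistics.stdev(xs)           # square root of the sample variance
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    mad = statistics.median([abs(x - median) for x in xs])
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,   # coefficient of variation = std / mean
        "iqr": q3 - q1,     # interquartile range = Q3 - Q1
        "mad": mad,         # median absolute deviation from the median
    }


print(describe([1, 2, 3, 4, 5, 6, 7, 8]))
```

The "inclusive" quantile method treats the minimum and maximum as the 0th and 100th percentiles. This is the same linear interpolation NumPy uses by default.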
[Figure: Histogram with fixed size bins (bins=50)]
Most frequent values
Value                 Count  Frequency (%)
9.5                   120    0.6%
11.4                  77     0.4%
6.6                   53     0.3%
5                     50     0.3%
5.5                   40     0.2%
10                    36     0.2%
19                    31     0.2%
7.8                   28     0.1%
5.7                   28     0.1%
3.9                   26     0.1%
Other values (17605)  19192  97.5%

Smallest 5 values
Value                 Count  Frequency (%)
0                     10     0.1%
0.0007333333333       1      < 0.1%
0.0021                1      < 0.1%
0.002514285714        1      < 0.1%
0.0039                1      < 0.1%

Largest 5 values
Value                 Count  Frequency (%)
5885.225              1      < 0.1%
3912.63625            1      < 0.1%
3801.801              1      < 0.1%
3222.741              1      < 0.1%
2977.519091           1      < 0.1%

nc
Categorical

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.8 KiB
Value  Count  Frequency (%)
ro     8890   45.2%
cz     3040   15.4%
hr     2935   14.9%
pl     1621   8.2%
sk     903    4.6%
mk     878    4.5%
hu     750    3.8%
me     503    2.6%
at     91     0.5%
heyah  70     0.4%
[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 2
Mean length: 2.01067019
Min length: 2

attr_os_str
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.8 KiB
Value    Count  Frequency (%)
ANDROID  17768  90.3%
IOS      1913   9.7%
[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 7
Median length: 7
Mean length: 6.611198618
Min length: 3

devicebrand
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 100
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 153.8 KiB
Value              Count  Frequency (%)
samsung            9785   49.7%
HUAWEI             4192   21.3%
Apple              1913   9.7%
xiaomi             665    3.4%
Xiaomi             509    2.6%
HONOR              349    1.8%
Redmi              329    1.7%
motorola           263    1.3%
Nokia              235    1.2%
Sony               221    1.1%
Other values (90)  1220   6.2%
[Figure: Frequencies of value counts]

Unique

Unique: 32
Unique (%): 0.2%
[Figure: Histogram of lengths of the category]

Length

Max length: 11
Median length: 7
Mean length: 6.306590112
Min length: 2

cnt_call
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 114
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.130328743
Minimum: 0
Maximum: 705
Zeros: 18800
Zeros (%): 95.5%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 705
Range: 705
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 11.43979625
Coefficient of variation (CV): 10.12076913
Kurtosis: 1367.630001
Mean: 1.130328743
Median Absolute Deviation (MAD): 0
Skewness: 30.02842689
Sum: 22246
Variance: 130.8689384
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Most frequent values
Value               Count  Frequency (%)
0                   18800  95.5%
3                   59     0.3%
1                   59     0.3%
6                   48     0.2%
4                   46     0.2%
5                   43     0.2%
2                   41     0.2%
8                   35     0.2%
7                   29     0.1%
11                  28     0.1%
Other values (104)  493    2.5%

Smallest 5 values
Value               Count  Frequency (%)
0                   18800  95.5%
1                   59     0.3%
2                   41     0.2%
3                   59     0.3%
4                   46     0.2%

Largest 5 values
Value               Count  Frequency (%)
705                 1      < 0.1%
579                 1      < 0.1%
471                 1      < 0.1%
364                 1      < 0.1%
343                 1      < 0.1%

cnt_dis
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 713
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.72354047
Minimum: 0
Maximum: 35711
Zeros: 18681
Zeros (%): 94.9%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 17
Maximum: 35711
Range: 35711
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 544.0456197
Coefficient of variation (CV): 12.73409492
Kurtosis: 2849.520163
Mean: 42.72354047
Median Absolute Deviation (MAD): 0
Skewness: 47.58521106
Sum: 840842
Variance: 295985.6363
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Most frequent values
Value               Count  Frequency (%)
0                   18681  94.9%
118                 6      < 0.1%
82                  5      < 0.1%
219                 5      < 0.1%
75                  5      < 0.1%
176                 4      < 0.1%
570                 4      < 0.1%
72                  4      < 0.1%
346                 4      < 0.1%
85                  4      < 0.1%
Other values (703)  959    4.9%

Smallest 5 values
Value               Count  Frequency (%)
0                   18681  94.9%
2                   3      < 0.1%
3                   1      < 0.1%
7                   1      < 0.1%
8                   1      < 0.1%

Largest 5 values
Value               Count  Frequency (%)
35711               1      < 0.1%
35616               1      < 0.1%
34153               1      < 0.1%
22164               1      < 0.1%
16149               1      < 0.1%

cnt_mobile
Real number (ℝ≥0)

ZEROS

Distinct: 32
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.015344749
Minimum: 0
Maximum: 50
Zeros: 1472
Zeros (%): 7.5%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 1
Q3: 3
95th percentile: 5
Maximum: 50
Range: 50
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.802174695
Coefficient of variation (CV): 0.8942265071
Kurtosis: 64.51463549
Mean: 2.015344749
Median Absolute Deviation (MAD): 1
Skewness: 4.742233027
Sum: 39664
Variance: 3.247833632
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=32)]
Most frequent values
Value               Count  Frequency (%)
1                   8468   43.0%
2                   4312   21.9%
3                   2563   13.0%
0                   1472   7.5%
4                   1451   7.4%
5                   726    3.7%
6                   296    1.5%
7                   182    0.9%
8                   68     0.3%
9                   44     0.2%
Other values (22)   99     0.5%

Smallest 5 values
Value               Count  Frequency (%)
0                   1472   7.5%
1                   8468   43.0%
2                   4312   21.9%
3                   2563   13.0%
4                   1451   7.4%

Largest 5 values
Value               Count  Frequency (%)
50                  1      < 0.1%
38                  1      < 0.1%
36                  1      < 0.1%
34                  1      < 0.1%
28                  1      < 0.1%

cnt_internet
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05700929831
Minimum: 0
Maximum: 16
Zeros: 18713
Zeros (%): 95.1%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 16
Range: 16
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2886782649
Coefficient of variation (CV): 5.06370493
Kurtosis: 505.2999286
Mean: 0.05700929831
Median Absolute Deviation (MAD): 0
Skewness: 13.10083127
Sum: 1122
Variance: 0.08333514061
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]
Most frequent values
Value               Count  Frequency (%)
0                   18713  95.1%
1                   851    4.3%
2                   100    0.5%
3                   11     0.1%
4                   3      < 0.1%
5                   2      < 0.1%
16                  1      < 0.1%

Smallest 5 values
Value               Count  Frequency (%)
0                   18713  95.1%
1                   851    4.3%
2                   100    0.5%
3                   11     0.1%
4                   3      < 0.1%

Largest 5 values
Value               Count  Frequency (%)
16                  1      < 0.1%
5                   2      < 0.1%
4                   3      < 0.1%
3                   11     0.1%
2                   100    0.5%

cnt_tv
Real number (ℝ≥0)

ZEROS

Distinct: 14
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4159341497
Minimum: 0
Maximum: 14
Zeros: 16062
Zeros (%): 81.6%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 3
Maximum: 14
Range: 14
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.144782328
Coefficient of variation (CV): 2.752316271
Kurtosis: 17.71448603
Mean: 0.4159341497
Median Absolute Deviation (MAD): 0
Skewness: 3.816086838
Sum: 8186
Variance: 1.310526578
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=14)]
Most frequent values
Value               Count  Frequency (%)
0                   16062  81.6%
1                   1733   8.8%
2                   757    3.8%
3                   436    2.2%
4                   273    1.4%
5                   197    1.0%
6                   107    0.5%
7                   65     0.3%
8                   26     0.1%
9                   11     0.1%
Other values (4)    14     0.1%

Smallest 5 values
Value               Count  Frequency (%)
0                   16062  81.6%
1                   1733   8.8%
2                   757    3.8%
3                   436    2.2%
4                   273    1.4%

Largest 5 values
Value               Count  Frequency (%)
14                  1      < 0.1%
12                  2      < 0.1%
11                  2      < 0.1%
10                  9      < 0.1%
9                   11     0.1%

cnt_voice
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02342360652
Minimum: 0
Maximum: 9
Zeros: 19260
Zeros (%): 97.9%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1731759933
Coefficient of variation (CV): 7.393224999
Kurtosis: 417.0160229
Mean: 0.02342360652
Median Absolute Deviation (MAD): 0
Skewness: 13.26119923
Sum: 461
Variance: 0.02998992466
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=5)]
Most frequent values
Value               Count  Frequency (%)
0                   19260  97.9%
1                   390    2.0%
2                   28     0.1%
3                   2      < 0.1%
9                   1      < 0.1%

Smallest 5 values
Value               Count  Frequency (%)
0                   19260  97.9%
1                   390    2.0%
2                   28     0.1%
3                   2      < 0.1%
9                   1      < 0.1%

Largest 5 values
Value               Count  Frequency (%)
9                   1      < 0.1%
3                   2      < 0.1%
2                   28     0.1%
1                   390    2.0%
0                   19260  97.9%

cnt_add_ons
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 192
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.349778975
Minimum: 0
Maximum: 2670
Zeros: 10600
Zeros (%): 53.9%
Memory size: 153.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 5
95th percentile: 30
Maximum: 2670
Range: 2670
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 28.46838937
Coefficient of variation (CV): 4.483366977
Kurtosis: 4030.342506
Mean: 6.349778975
Median Absolute Deviation (MAD): 0
Skewness: 47.99302987
Sum: 124970
Variance: 810.4491932
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Most frequent values
Value               Count  Frequency (%)
0                   10600  53.9%
1                   1627   8.3%
2                   1070   5.4%
3                   779    4.0%
4                   641    3.3%
5                   527    2.7%
6                   439    2.2%
7                   346    1.8%
8                   313    1.6%
9                   275    1.4%
Other values (182)  3064   15.6%

Smallest 5 values
Value               Count  Frequency (%)
0                   10600  53.9%
1                   1627   8.3%
2                   1070   5.4%
3                   779    4.0%
4                   641    3.3%

Largest 5 values
Value               Count  Frequency (%)
2670                1      < 0.1%
961                 1      < 0.1%
761                 1      < 0.1%
684                 1      < 0.1%
599                 1      < 0.1%

Interactions

[Figures: pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a consequence, for an exact, non-constant linear relationship r is +1 or -1 however steep the line is: only the sign of the slope matters.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
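The definition above can be sketched in a few lines of stdlib Python. The sketch uses the sample (n - 1) normalisation for both the covariance and the standard deviations. The factor cancels in the ratio, so the choice does not change r. The function name pearson_r is illustrative, not part of pandas-profiling. The usage lines check the properties described above: an exact decreasing line gives -1, and shifting and rescaling X leaves r unchanged.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))                        # about 0.8
print(pearson_r(xs, [3 - 2 * x for x in xs]))   # about -1.0: exact decreasing line
print(pearson_r([10 * x + 5 for x in xs], ys))  # same as the first value: location/scale invariance
```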

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
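A stdlib sketch of ρ as Pearson's r computed on ranks is shown below. Tied values receive the average of the ranks they span. This is an assumed, though common, convention, and it matters for columns like cnt_call, where 95.5% of the values are 0. The function names are illustrative. The usage compares ρ and r on a curve that is strictly increasing but not linear.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values all receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [math.exp(x) for x in xs]  # nonlinear but strictly increasing in x
print(spearman_rho(xs, ys))     # about 1.0: perfectly monotonic
print(pearson_r(xs, ys))        # noticeably below 1: the relation is not linear
```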

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant pairs of observations (ordered the same way in X and in Y) and the discordant pairs (ordered oppositely). τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form. When the data contain ties, the τ-b variant divides by a tie-adjusted number of pairs instead.
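The brute-force sketch below counts concordant and discordant pairs as just described. It returns both the plain τ-a and the tie-adjusted τ-b, which divides by the geometric mean of the number of pairs not tied in X and the number not tied in Y. Ties are common in the zero-heavy count columns of this dataset, and that is where the two variants diverge. The loop is O(n²): fine for illustration, but impractical for all 19,681 rows. The function name is hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(xs, ys):
    """Return (tau_a, tau_b) by counting every pair of observations: O(n^2)."""
    concordant = discordant = 0
    tied_x_only = tied_y_only = 0  # pairs tied in both x and y are counted in neither
    for i, j in combinations(range(len(xs)), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both coordinates move in the same direction
        else:
            discordant += 1
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    tau_a = (concordant - discordant) / total_pairs
    # tau-b: geometric mean of the pairs untied in x and the pairs untied in y.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    tau_b = (concordant - discordant) / math.sqrt(untied_x * untied_y)
    return tau_a, tau_b


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # no ties: both values equal 4/6
print(kendall_tau([1, 1, 2, 3], [1, 2, 3, 4]))  # one tie in x: tau_a = 5/6, tau_b = 5/sqrt(30)
```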

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
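The sketch below implements the bias correction as it is commonly stated for Bergsma (2013). φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at 0. The table dimensions r and k are replaced by r - (r-1)²/(n-1) and k - (k-1)²/(n-1). Whether pandas-profiling's implementation matches this line for line is an assumption. The function name and the toy counts are hypothetical.

```python
import math


def cramers_v_bias_corrected(table):
    """Bias-corrected Cramér's V (after Bergsma, 2013) for an r x k table of counts.

    Assumes every row and column total is non-zero.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the counts expected under independence.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy 2 x 2 tables with hypothetical counts.
print(cramers_v_bias_corrected([[50, 0], [0, 50]]))    # perfect association: about 1.0
print(cramers_v_bias_corrected([[25, 25], [25, 25]]))  # independence: 0.0
```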

Missing values


Sample

First rows

       user_id                                                           avg_revenue  nc  attr_os_str  devicebrand  cnt_call  cnt_dis  cnt_mobile  cnt_internet  cnt_tv  cnt_voice  cnt_add_ons
0      000fa182-b7ab-4856-9c5e-a63701385979                              25.364000    mk  ANDROID      samsung      0.0       0.0      1.0         0.0           1.0     0.0        2.0
1      001b4889-c295-4ab5-a7fb-c636ece80890                              2.000000     at  IOS          Apple        0.0       0.0      1.0         0.0           0.0     0.0        0.0
2      001db037-2940-4e66-abab-135af3338eec                              18.241600    hr  ANDROID      samsung      0.0       0.0      2.0         0.0           0.0     0.0        1.0
3      002c732e-63bb-4d24-be14-51d6415e7b8a                              15.428400    hr  ANDROID      samsung      0.0       0.0      1.0         0.0           0.0     0.0        0.0
4      002c968a-d3e6-428b-be16-36991bae1876                              0.555556     me  ANDROID      HONOR        0.0       0.0      1.0         0.0           0.0     0.0        1.0
5      003a2fea-9e23-48b7-a704-6f5c3ae2d6a4                              35.491820    hr  ANDROID      samsung      0.0       0.0      1.0         1.0           1.0     1.0        1.0
6      00758022-9b70-441d-a317-d1c502b0f03a                              142.255100   hr  ANDROID      samsung      0.0       0.0      5.0         0.0           0.0     0.0        3.0
7      008582ed-314a-4abb-b9d5-c95c647eae11                              33.215000    me  ANDROID      samsung      0.0       0.0      1.0         0.0           0.0     0.0        0.0
8      0087ac6adda43eb8b5bf62b0e1bd9b8dac25030ea15344ea56ed9afb26f6dc2e  5.345000     sk  ANDROID      samsung      0.0       0.0      1.0         0.0           0.0     0.0        1.0
9      00968692-5460-4ea8-9a8e-a93000d0ebef                              71.436000    mk  ANDROID      samsung      0.0       0.0      3.0         0.0           0.0     0.0        3.0

Last rows

       user_id                                                           avg_revenue  nc  attr_os_str  devicebrand  cnt_call  cnt_dis  cnt_mobile  cnt_internet  cnt_tv  cnt_voice  cnt_add_ons
19671  ff1a37bd-8a60-45db-8522-b5d02981f491                              94.110250    hr  ANDROID      HUAWEI       0.0       0.0      1.0         0.0           0.0     0.0        12.0
19672  ff2a6ca3-efd4-4480-85a6-ec6de935e0d1                              13.804227    hr  ANDROID      htc          0.0       0.0      2.0         1.0           1.0     1.0        3.0
19673  ff3fa14e-5051-469a-9db2-a78601531f12                              337.584000   mk  ANDROID      samsung      0.0       0.0      3.0         1.0           1.0     1.0        1.0
19674  ff4637aa73f626ce2fa617ffad4b825a8f975f0d78cdb7914da23e800140da4a  161.564000   sk  ANDROID      samsung      0.0       0.0      6.0         0.0           0.0     0.0        5.0
19675  ff46d44c-ee1b-4f7d-bd95-333abac5ad41                              88.450000    me  ANDROID      samsung      0.0       0.0      5.0         0.0           1.0     0.0        2.0
19676  ff87dff2-f330-4f0e-b404-ab2c008578d9                              3.200000     mk  ANDROID      HUAWEI       0.0       0.0      1.0         0.0           0.0     0.0        2.0
19677  ffa08ebc-e4bb-46ea-ae48-1cc0b5a1b612                              46.659513    hr  ANDROID      Xiaomi       0.0       0.0      3.0         0.0           0.0     0.0        6.0
19678  ffa1111d-1406-4bc5-ad2f-d57c119de744                              108.836000   hr  ANDROID      samsung      0.0       0.0      1.0         0.0           0.0     0.0        0.0
19679  ffb25cab-d77b-48b9-8a1d-093a894f1d4d                              36.926500    hr  ANDROID      samsung      0.0       0.0      1.0         0.0           0.0     0.0        3.0
19680  ffbd7171-b306-4f19-9077-ac6300c3deb9                              3.200000     mk  ANDROID      HUAWEI       0.0       0.0      1.0         0.0           0.0     0.0        0.0

Duplicate rows

Most frequent

   user_id          avg_revenue  nc  attr_os_str  devicebrand  cnt_call  cnt_dis  cnt_mobile  cnt_internet  cnt_tv  cnt_voice  cnt_add_ons  count
0  100487292        46.601818    sk  ANDROID      Xiaomi       0.0       0.0      2.0         0.0           0.0     0.0        6.0          2
1  2514811034       1.542975     ro  ANDROID      samsung      0.0       0.0      1.0         0.0           0.0     0.0        0.0          2
2  2917446999       24.772364    ro  ANDROID      samsung      0.0       0.0      0.0         0.0           2.0     0.0        0.0          2
3  3103546688       61.518712    ro  ANDROID      HUAWEI       0.0       0.0      3.0         0.0           0.0     0.0        0.0          2
4  TMCZ_6009584899  58.044620    cz  ANDROID      HONOR        0.0       0.0      1.0         0.0           0.0     0.0        0.0          2
5  TMCZ_9613733     7.695000     cz  ANDROID      samsung      0.0       0.0      4.0         0.0           1.0     0.0        0.0          2
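A table like this can be obtained by counting identical rows and keeping those that occur more than once. The stdlib sketch below shows the idea; the function name is hypothetical. Here each of the six duplicated rows occurs exactly twice. So counting every occurrence after the first and counting duplicated groups both give the 6 duplicate rows reported in the overview.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (number of duplicate rows, duplicated rows with their counts, most frequent first).

    The duplicate-row count counts every occurrence after the first one.
    """
    counts = Counter(tuple(row) for row in rows)
    n_duplicates = sum(c - 1 for c in counts.values() if c > 1)
    most_frequent = sorted(
        ((row, c) for row, c in counts.items() if c > 1),
        key=lambda item: item[1],
        reverse=True,
    )
    return n_duplicates, most_frequent


rows = [("a", 1.0), ("b", 2.0), ("a", 1.0), ("c", 3.0), ("b", 2.0), ("a", 1.0)]
print(duplicate_summary(rows))
```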